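Language models have achieved remarkable performance on a wide range of tasks that require natural language understanding. Nevertheless, state-of-the-art models have generally struggled with tasks that require quantitative reasoning, such as solving mathematics, science, and engineering problems at the college level. To help close this gap, we introduce Minerva, a large language model pretrained on general natural language data and further trained on technical content. The model achieves state-of-the-art performance on technical benchmarks without the use of external tools. We also evaluate our model on over two hundred undergraduate-level problems in physics, biology, chemistry, economics, and other sciences that require quantitative reasoning, and find that the model can correctly answer nearly a third of them.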
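We introduce the Block-Recurrent Transformer, which applies a transformer layer in a recurrent fashion along a sequence and has linear complexity with respect to sequence length. Our recurrent cell operates on blocks of tokens rather than single tokens during training, and leverages parallel computation within a block in order to make efficient use of accelerator hardware. The cell itself is strikingly simple. It is merely a transformer layer: it uses self-attention and cross-attention to efficiently compute a recurrent function over a large set of state vectors and tokens. Our design was inspired in part by LSTM cells, and it uses LSTM-style gates, but it scales the typical LSTM cell up by several orders of magnitude. Our implementation of recurrence has the same cost in both computation time and parameter count as a conventional transformer layer, but offers dramatically improved perplexity in language modeling tasks over very long sequences. Our model outperforms a long-range Transformer-XL baseline by a wide margin, while running twice as fast. We demonstrate its effectiveness on PG19 (books), arXiv papers, and GitHub source code. Our code has been released as open source.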
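The weight matrix (WM) of a neural network (NN) is its program. The programs of many traditional NNs are learned through gradient descent on some error function and then remain fixed. The WM of a self-referential NN, however, can keep rapidly modifying itself at runtime. In principle, such NNs can meta-learn to learn, meta-meta-learn to meta-learn to learn, and so on, in the sense of recursive self-improvement. While NN architectures potentially capable of implementing such behaviour have been proposed since the '90s, there have been few if any practical studies. Here we revisit such NNs, building upon recent successes of fast weight programmers and closely related linear Transformers. We propose a scalable self-referential WM (SRWM) that learns to use outer products and the delta update rule to modify itself. We evaluate our SRWM in supervised few-shot learning and in multi-task reinforcement learning with procedurally generated game environments. Our experiments demonstrate both the practical applicability and the competitive performance of the proposed SRWM. Our code is public.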
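The delta update rule mentioned in the abstract above can be illustrated with a minimal sketch. This is the generic delta rule used with fast weights, not the authors' SRWM implementation: the function names `matvec` and `delta_rule_update` are hypothetical, and the key, target, and learning rate are passed in by hand here. How the SRWM generates those quantities from itself is not stated in the abstract and is not modeled.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def delta_rule_update(W, key, target, lr):
    """Generic delta rule: W <- W + lr * outer(target - W @ key, key).

    The correction is a rank-one outer product, so the value that W
    currently returns for `key` is moved toward `target`.
    """
    current = matvec(W, key)
    error = [t - c for t, c in zip(target, current)]
    return [
        [w + lr * e * k for w, k in zip(row, key)]
        for row, e in zip(W, error)
    ]


# Illustrative values only (assumptions, not taken from the paper).
W0 = [[0.0, 0.0], [0.0, 0.0]]
key = [1.0, 0.0]          # unit-norm key
target = [2.0, -1.0]
W1 = delta_rule_update(W0, key, target, lr=1.0)
retrieved = matvec(W1, key)
```

With a unit-norm key and a learning rate of 1, a single update makes the matrix return the target exactly for that key. Smaller learning rates move the stored value only part of the way.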
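We share our experience with the recently released WILDS benchmark, a collection of ten datasets dedicated to developing models and training strategies that are robust to domain shifts. Several experiments yield a couple of critical observations which we believe are of general interest for any future work on WILDS. Our study focuses on two datasets: iWildCam and FMoW. We show that (1) conducting separate cross-validation for each evaluation metric is crucial for both datasets, (2) a weak correlation between validation and test performance might make model development difficult for iWildCam, (3) minor changes in the training of hyper-parameters improve the baseline by a relatively large margin (mainly on FMoW), and (4) there is a strong correlation between certain domains and certain target labels (mainly on iWildCam). To the best of our knowledge, no prior work on these datasets has reported these observations despite their obvious importance. Our code is public.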
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of the challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
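Batch processes show several sources of variability, from the properties of raw materials to the initial and evolving conditions that change during the different events in the manufacturing process. In this chapter, we illustrate with an industrial example how to use machine learning to reduce this apparent excess of data while maintaining the information relevant to process engineers. Two common use cases are presented: 1) automated analysis to quickly find correlations in batch process data, and 2) trajectory analysis to monitor and identify anomalous batches, leading to process control improvements.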
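In recent years, significant advances have been made in the design and evaluation of balanced (hyper)graph partitioning algorithms. We survey trends of the last decade in practical algorithms for balanced (hyper)graph partitioning, together with future research directions. Our work serves as an update to a previous survey on the topic. In particular, this survey extends the previous one by also covering hypergraph partitioning and streaming algorithms, and has an additional focus on parallel algorithms.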
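Predictions are often probabilities; for example, a prediction could be for precipitation tomorrow, but with only a 30% chance. Given such probabilistic predictions together with the actual outcomes, "reliability diagrams" help detect and diagnose statistically significant discrepancies (so-called "miscalibration") between the predictions and the outcomes. The canonical reliability diagrams histogram the observed and expected values of the predictions; replacing the hard histogram binning with soft kernel density estimation is another common practice. But which widths of bins or kernels are best? Plots of the cumulative differences between the observed and expected values largely avoid this question, by displaying miscalibration directly as the slopes of secant lines for the graphs. Slope is easy to perceive with quantitative precision, even when the constant offsets of the secant lines are irrelevant. There is no need to bin or perform kernel density estimation. The existing standard metrics of miscalibration each summarize a reliability diagram as a single scalar statistic. The cumulative plots naturally lead to scalar metrics for the deviation of the graph of cumulative differences away from zero; good calibration corresponds to a horizontal, flat graph which deviates little from zero. The cumulative approach is currently unconventional, yet offers many favorable statistical properties, guaranteed via mathematical theory backed by rigorous proofs and illustrative numerical examples. In particular, metrics based on binning or kernel density estimation unavoidably must trade off statistical confidence for the ability to resolve variations as a function of the predicted probability, or vice versa. Widening the bins or kernels averages away random noise while giving up some resolving power. Narrowing the bins or kernels enhances resolving power while not averaging away as much noise.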
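A minimal sketch of the cumulative-difference idea described in the abstract above: sort the predictions by predicted probability, then accumulate the differences between observed outcomes and predicted probabilities, scaled by the sample size. The function names and the toy data are hypothetical, and the maximum absolute deviation used as a scalar summary is one illustrative choice of a "deviation from zero" metric, not necessarily the one defined in the paper.

```python
def cumulative_differences(probs, outcomes):
    """Cumulative sum of (outcome - predicted probability) / n,
    taken in order of increasing predicted probability."""
    n = len(probs)
    order = sorted(range(n), key=lambda i: probs[i])
    running = 0.0
    curve = []
    for i in order:
        running += (outcomes[i] - probs[i]) / n
        curve.append(running)
    return curve


def max_abs_deviation(curve):
    """Illustrative scalar summary: largest distance of the curve from zero."""
    return max(abs(c) for c in curve)


# Toy data (assumed for illustration only).
# Well calibrated: predicted 50%, half of the outcomes are positive.
calibrated = cumulative_differences([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])
# Overconfident: predicted 90%, only a quarter of the outcomes are positive.
overconfident = cumulative_differences([0.9, 0.9, 0.9, 0.9], [1, 0, 0, 0])

calibrated_score = max_abs_deviation(calibrated)
overconfident_score = max_abs_deviation(overconfident)
```

The calibrated curve stays close to zero, while the overconfident curve drifts steadily downward. Its final value equals the mean outcome minus the mean prediction, and no bin or kernel width has to be chosen anywhere.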
Context-aware decision support in the operating room can foster surgical safety and efficiency by leveraging real-time feedback from surgical workflow analysis. Most existing works recognize surgical activities at a coarse-grained level, such as phases, steps or events, leaving out fine-grained interaction details about the surgical activity; yet those are needed for more helpful AI assistance in the operating room. Recognizing surgical actions as triplets of <instrument, verb, target> combination delivers comprehensive details about the activities taking place in surgical videos. This paper presents CholecTriplet2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos. The challenge granted private access to the large-scale CholecT50 dataset, which is annotated with action triplet information. In this paper, we present the challenge setup and assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge. A total of 4 baseline methods from the challenge organizers and 19 new deep learning algorithms by competing teams are presented to recognize surgical action triplets directly from surgical videos, achieving mean average precision (mAP) ranging from 4.2% to 38.1%. This study also analyzes the significance of the results obtained by the presented approaches, performs a thorough methodological comparison between them, in-depth result analysis, and proposes a novel ensemble method for enhanced recognition. Our analysis shows that surgical workflow analysis is not yet solved, and also highlights interesting directions for future research on fine-grained surgical activity recognition which is of utmost importance for the development of AI in surgery.